Bilingual Embeddings and Word Alignments for Translation Quality Estimation
نویسندگان
چکیده
This paper describes our submission UFAL MULTIVEC to the WMT16 Quality Estimation Shared Task, for EnglishGerman sentence-level post-editing effort prediction and ranking. Our approach exploits the power of bilingual distributed representations, word alignments and also manual post-edits to boost the performance of the baseline QuEst++ set of features. Our model outperforms the baseline, as well as the winning system in WMT15, Referential Translation Machines (RTM), in both scoring and ranking sub-tasks.
منابع مشابه
Bilingual Word Embeddings for Phrase-Based Machine Translation
We introduce bilingual word embeddings: semantic embeddings associated across two languages in the context of neural language models. We propose a method to learn bilingual embeddings from a large unlabeled corpus, while utilizing MT word alignments to constrain translational equivalence. The new embeddings significantly out-perform baselines in word semantic similarity. A single semantic simil...
متن کاملConvolution-Enhanced Bilingual Recursive Neural Network for Bilingual Semantic Modeling
Estimating similarities at different levels of linguistic units, such as words, sub-phrases and phrases, is helpful for measuring semantic similarity of an entire bilingual phrase. In this paper, we propose a convolution-enhanced bilingual recursive neural network (ConvBRNN), which not only exploits word alignments to guide the generation of phrase structures but also integrates multiple-level ...
متن کاملBilingual Word Embeddings from Parallel and Non-parallel Corpora for Cross-Language Text Classification
In many languages, sparse availability of resources causes numerous challenges for textual analysis tasks. Text classification is one of such standard tasks that is hindered due to limited availability of label information in lowresource languages. Transferring knowledge (i.e. label information) from high-resource to low-resource languages might improve text classification as compared to the ot...
متن کاملLearning Bilingual Projections of Embeddings for Vocabulary Expansion in Machine Translation
We propose a simple log-bilinear softmaxbased model to deal with vocabulary expansion in machine translation. Our model uses word embeddings trained on significantly large unlabelled monolingual corpora and learns over a fairly small, wordto-word bilingual dictionary. Given an out-of-vocabulary source word, the model generates a probabilistic list of possible translations in the target language...
متن کاملBilBOWA: Fast Bilingual Distributed Representations without Word Alignments
We introduce BilBOWA (Bilingual Bag-ofWords without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data. Instead it trains directly on monolingual data and extracts a bilingual signal from a smaller set of raw-text sentence-alig...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016